Enabling RCCL and GEMM sweep test on GH action #55
Sample for GEMM Sweep CI: Run
Sample run with aorta-report check-in:
| paths: | ||
| - 'scripts/gemm_analysis/**' | ||
| - 'config/gemm_overlap/**' | ||
| #pull_request: |
If this code is not required, please delete these lines.
We are keeping this for now because we are still testing the YAML locally and may need this trigger again in the near future. In production we do not want the workflow to run on every PR, so the trigger is commented out: it stays available for local testing on a branch but is not enabled in production.
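One possible alternative to a commented-out trigger, sketched here under assumptions rather than taken from this PR: a `workflow_dispatch` trigger allows manual runs without firing on every PR. The workflow name is hypothetical, and the `push` key is an assumption about which trigger the `paths` filter in the diff belongs to.

```yaml
# Hypothetical sketch, not the workflow from this PR.
name: GEMM Sweep (sketch)
on:
  # Manual runs only; nothing starts automatically when a PR is opened.
  workflow_dispatch:
  # Assumed trigger for the existing path filter shown in the diff.
  push:
    paths:
      - 'scripts/gemm_analysis/**'
      - 'config/gemm_overlap/**'
```

Note that GitHub only offers a manual run for a workflow file that already exists on the default branch, so this may not cover testing before the first merge.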
| gemm-sweep: | ||
| name: Run GEMM Sweep Profiling | ||
| runs-on: [self-hosted, gpu, rocm] | ||
| runs-on: self-hosted |
I hope you are not planning to keep the self-hosted in the final merge. Please change it to the runner machine name once you get it.
For now I am keeping self-hosted, as we have not received the runner yet.
| working-directory: docker | ||
| working-directory: aorta | ||
| run: | | ||
| #mkdir -p ~/.docker/cli-plugins |
Remove the commented-out lines.
Done
docker/rccl_test/docker-compose.yaml
Outdated
| volumes: | ||
| - /home/manrao:/manrao | ||
| - /home/oyazdanb/aorta:/workspace/aorta |
Is this correct? Why are we mapping Sonbol's area?
This file is not used; we should remove it later.
| path: aorta/${{ env.SWEEP_DIR }}/ | ||
| retention-days: 90 | ||
| git config user.name "GitHub Actions Bot" | ||
| git config user.email "<>" |
Is it allowed to leave the email empty and still push to GitHub?
It is working, and that is how we do it in the other shark test suites.
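For reference, a non-empty identity is also an option. The sketch below is an assumption, not part of this PR: it uses the noreply address associated with the `github-actions[bot]` account, the step name is hypothetical, and `working-directory` is copied from the diff.

```yaml
# Hypothetical step; only the working-directory value comes from the diff.
- name: Configure git identity
  working-directory: aorta-report
  run: |
    # Attribute commits to the GitHub Actions bot account.
    git config user.name "github-actions[bot]"
    git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
```

With this identity, commits pushed by the workflow are shown as authored by the Actions bot instead of an unnamed author.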
| working-directory: aorta-report | ||
| run: | | ||
| git config user.name "GitHub Actions Bot" | ||
| git config user.email "<>" |
Same here.
amd-vivekag left a comment
Approved. Some comments and TODOs are still pending and can be resolved in the next PR.
This is a temporary change that uses a local AMD GPU installation instead of fetching it from the AMD CDN.